Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | the | 26 | not |
2 | of | 27 | at |
3 | to | 28 | can |
4 | and | 29 | an |
5 | a | 30 | will |
6 | in | 31 | your |
7 | is | 32 | we |
8 | that | 33 | all |
9 | for | 34 | viruses |
10 | it | 35 | which |
11 | The | 36 | but |
12 | on | 37 | they |
13 | be | 38 | has |
14 | with | 39 | one |
15 | you | 40 | he |
16 | are | 41 | his |
17 | as | 42 | more |
18 | was | 43 | their |
19 | by | 44 | This |
20 | virus | 45 | had |
21 | this | 46 | computer |
22 | or | 47 | A |
23 | I | 48 | would |
24 | from | 49 | about |
25 | have | 50 | some |
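The table shows the 50 most frequent words in the corpus. At these ranks one usually finds stopwords. Capitalized forms such as "The" and "This" are counted separately from their lowercase variants.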
Language: Afrikaans
This list is a good starting point for a stopword list for the language.
Usually a small, balanced corpus is enough to get a good list of high-frequency words. But if a small corpus is dominated by one prominent topic, that topic shows up even among the top-ranked words, as "virus", "viruses" and "computer" do in the table above.
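A ranked list like this can be approximated by counting tokens in raw text. The following is a minimal sketch, not the method used to build this corpus: the function name `top_words`, the whitespace tokenization and the sample sentence are assumptions made for illustration. Counting is case-sensitive, so "the" and "The" get separate ranks, as in the table.

```python
from collections import Counter


def top_words(text, n=50):
    """Return (rank, word) pairs for the n most frequent tokens in text."""
    # Assumption: tokens are separated by whitespace; a real pipeline
    # would use a proper tokenizer.
    counts = Counter(text.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word) for rank, (word, _) in enumerate(ranked[:n], start=1)]


# Hypothetical sample text with a topical bias towards computer viruses.
sample = "the virus and the computer and the user"
for rank, word in top_words(sample, n=3):
    print(rank, word)
```

Even in this tiny sample a topic word reaches the top three, which is the effect described above for small corpora with a dominant topic.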
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
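The query above can be tried against a small in-memory SQLite database. This is a sketch under assumptions: the two-column layout of the `words` table is reduced to what the query needs, the placeholder entries for ids 1 to 100 are invented (the query only implies that word ranks start at `w_id` 101), and the three sample words are taken from the top of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: ids 1..100 hold entries that are not part of the ranked
# wordlist; the placeholder strings are hypothetical.
rows = [(i, f"<reserved-{i}>") for i in range(1, 101)]
# Words sorted by frequency, starting at id 101.
rows += [(101, "the"), (102, "of"), (103, "to")]
conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

query = """
SELECT w_id - 100 AS rank_in_wordlist, word
FROM words
WHERE w_id > 100
ORDER BY w_id
LIMIT 50;
"""
result = conn.execute(query).fetchall()
for rank, word in result:
    print(rank, word)
```

Subtracting 100 from `w_id` turns the stored id directly into the rank shown in the first column of the table.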
3.4 Sample words for different frequency ranges